Automating the Detection of Anomalies and Trends from Text
Abstract
Scalable and robust nonnegative matrix factorization (NMF) algorithms and software are needed for the generation of feature vectors from text corpora. By preserving nonnegativity, the NMF facilitates a sum-of-parts representation of the underlying term usage patterns in textual data. Both training and test sets of documents can be parsed and then factored by the NMF to produce a reduced-rank representation of an entire document space. The resulting feature and coefficient matrix factors are then used to cluster documents. Recent studies with documents from the Aviation Safety Reporting System (ASRS) have shown that (known) anomalies of training documents can be directly mapped to NMF-generated feature vectors. Dominant features (tracking words or sentences) of test documents can then be used to generate anomaly relevance scores for those documents.
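The factor-then-cluster step described above can be illustrated with a minimal sketch. It is not the authors' software: the toy term-by-document matrix, the function names, the choice of Lee-Seung multiplicative updates, and the rule of assigning each document to its dominant feature are all assumptions made for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def nmf(a, rank, iters=500, seed=0, eps=1e-9):
    """Approximate a (terms x documents) by w * h with nonnegative factors.

    Uses multiplicative updates for the Frobenius-norm objective (an assumed
    choice; the abstract does not name a specific algorithm).
    w holds the feature vectors (terms x rank); h holds the coefficients
    (rank x documents).
    """
    rng = random.Random(seed)
    m, n = len(a), len(a[0])
    # Strictly positive starting values keep every later update nonnegative.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, a), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(a, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(m)]
    return w, h

def cluster_by_dominant_feature(h):
    """Label each document with the index of its largest coefficient."""
    return [max(range(len(h)), key=lambda k: h[k][j]) for j in range(len(h[0]))]

# Hypothetical counts: rows are four terms, columns are four documents.
# Documents 0-1 share the first two terms; documents 2-3 share the last two.
term_doc = [
    [2, 4, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 3],
    [0, 0, 1, 1],
]

w, h = nmf(term_doc, rank=2)
labels = cluster_by_dominant_feature(h)
print(labels)
```

Because the updates only multiply by nonnegative ratios, both factors stay nonnegative, which is what permits the sum-of-parts reading of each feature vector. For real corpora a sparse-matrix library would be a more practical choice than nested lists.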
Similar resources
TDPA: Trend Detection and Predictive Analytics
Text mining is the process of exploratory text analysis, by automatic or semi-automatic means, that helps uncover previously unknown information. Text mining is a highly interdisciplinary research area, bringing together research insights from the fields of data mining, natural language processing, machine learning, and information retrieval. The amount of textual data available is too hug...
Fast Unsupervised Automobile Insurance Fraud Detection Based on Spectral Ranking of Anomalies
Collecting insurance fraud samples is costly and, if performed manually, very time consuming. This issue suggests the use of unsupervised models. One accurate method in this regard is Spectral Ranking of Anomalies (SRA), which has been shown to work better than other methods specifically for auto insurance fraud detection. However, this approach is not scalable to large samples and is not appro...
The effect of estimation methods on fractal modeling for anomalies’ detection in the Irankuh area, Central Iran
This study aims to assess the effect of the Ordinary Kriging (OK) and Inverse Distance Weighted (IDW) estimation methods on the separation of geochemical anomalies, based on soil samples, using the Concentration-Area (C-A) fractal model in the Irankuh area, central Iran. Variograms and an anisotropic ellipsoid were generated for the Pb and Zn distribution. Threshold values from the C-A log-log plots based on the e...
Thermal anomalies detection before earthquake using three filters (Fourier, Wavelet and Logarithmic Differential Filter), A Case Study of two Earthquakes in Iran
Earthquakes are among the most destructive natural phenomena, causing human and financial losses. An efficient prediction and early warning system would help reduce the effects of destructive earthquakes. In this research, the soil temperature time-series data, obtained from three meteorological stations, using three filters (Fourier, Wavelet and Logarithmic Different...
The detection of 11th of March 2011 Tohoku's TEC seismo-ionospheric anomalies using the Singular Value Thresholding (SVT) method
The Total Electron Content (TEC) measured by the Global Positioning System (GPS) is useful for registering the pre-earthquake ionospheric anomalies appearing before a large earthquake. In this paper the TEC value was predicted using the singular value thresholding (SVT) method. Also, the anomaly is detected utilizing this predicted value and the definition of the threshold value, leading to the...
Publication date: 2007